RAG
- Overview
- Techniques
Retrieval-Augmented Generation (RAG) is a natural language processing technique that combines the strengths of retrieval-based and generative models to produce more accurate and contextually relevant responses. A RAG system first retrieves relevant documents or passages from a large corpus based on the input query, then conditions generation on that retrieved information to produce a coherent, informed response. This is especially valuable when a query requires specific knowledge that may not be present in the generative model's training data.
Key Components:
- Retriever: searches a large corpus and returns the documents or passages most relevant to the input query. It can use techniques ranging from keyword matching and semantic search to dense vector representations to find the most pertinent information.
- Generator: takes the retrieved information and produces a coherent, contextually relevant response. This is typically a generative language model that synthesizes information, answers questions, or creates content based on what the retriever supplies.
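The two components can be sketched as a minimal pipeline. Everything here is a toy stand-in: `embed` is a bag-of-words counter rather than a real embedding model, and `generate` is a template stub where a production system would call an LLM.

```python
from collections import Counter
from math import sqrt

# Toy corpus standing in for a document store.
CORPUS = [
    "RAG combines a retriever with a generative language model.",
    "The retriever searches a corpus for documents relevant to the query.",
    "The generator conditions its answer on the retrieved documents.",
]

def embed(text: str) -> Counter:
    """Hypothetical embedding: a bag-of-words vector (real systems use dense encoders)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    """Retriever: rank corpus documents by similarity to the query."""
    ranked = sorted(CORPUS, key=lambda d: cosine(embed(query), embed(d)), reverse=True)
    return ranked[:k]

def generate(query: str, context: list[str]) -> str:
    """Generator stub: a real system would prompt an LLM with the context."""
    return f"Answer to '{query}' using context: {' '.join(context)}"

docs = retrieve("what does the retriever do?")
print(generate("what does the retriever do?", docs))
```

Swapping `embed` for a real encoder and `generate` for an LLM call turns this skeleton into the Basic RAG setup described in the table below.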
Type | Technique | Definition | Technical Details | Pros | Cons | Use Cases |
---|---|---|---|---|---|---|
Foundational | Basic RAG | Combines a retriever and a generative language model to augment responses with external knowledge from a retrievable dataset | Uses an embedding model to vectorize queries and documents, a vector store for similarity search, and a language model that conditions generation on retrieved relevant documents | Straightforward to implement, improves accuracy over a standalone LLM, allows use of up-to-date knowledge | Performance depends heavily on retrieval quality, lacks advanced context handling | Basic Q&A systems, chatbots, knowledge-based assistance | |
Foundational | Reliable RAG | Enhanced RAG that validates retrieved document relevance and highlights used segments | Adds validation layer to check document relevancy and highlights specific segments used for answering | Improves answer reliability, provides transparency, reduces hallucinations | Additional processing overhead, requires relevance validation logic | High-stakes Q&A, legal document analysis, medical information retrieval | |
Foundational | Optimizing Chunk Sizes | Systematic approach to finding optimal chunk sizes for document processing | Experiments with different chunk sizes and evaluates performance using metrics like faithfulness and relevancy | Optimizes retrieval quality, balances context preservation vs precision | Requires experimentation, dataset-dependent optimal sizes | Performance tuning, domain-specific optimization, research and benchmarking | |
Foundational | Proposition Chunking | Breaking down text into concise, complete, meaningful propositions for better retrieval | Uses LLM to generate factual statements from chunks, validates quality, embeds propositions for retrieval | Improves retrieval precision, better for knowledge extraction, reduces context fragmentation | Computationally expensive, requires quality validation | Knowledge extraction, precise fact retrieval, academic research | |
Query Enhancement | Query Transformation | Modifying or decomposing queries before retrieval to improve results | Techniques include rephrasing, question decomposition, routing queries to appropriate sub-engines | Handles complex queries better; improves relevance and answer quality | Increased pipeline complexity; needs query understanding modules | Complex multi-part questions, multi-domain retrieval systems | |
Query Enhancement | Reranker | A second-stage model that refines and improves initial retrieval results ranking | Typically neural cross-encoders that re-score retrieved candidates based on query-document interactions | Improves retrieval precision; reduces noise in top results | Adds latency and compute overhead | Search engines, QA systems, improving initial retriever results | |
Query Enhancement | Hypothetical Document Embedding (HyDE) | Generating hypothetical answers or documents with LLMs to create embeddings for retrieval | LLMs synthesize potential relevant info from query, which is then embedded for comparison with documents | Improves recall by anticipating relevant content; integrates generation and retrieval | Relies on LLM quality; extra compute cost | Open-domain QA, exploratory search | |
Query Enhancement | HyPE (Hypothetical Prompt Embedding) | Precomputes hypothetical questions per chunk at indexing time for better query alignment | Generates multiple hypothetical queries per chunk, embeds questions instead of chunks, matches user queries against stored questions | Improves retrieval alignment, no runtime overhead, higher precision and recall | Requires more storage, complex indexing process | Question-answering systems, information retrieval, search optimization | |
Context Enrichment | Semantic Chunking | Splitting documents into semantically meaningful chunks to preserve coherent ideas | Instead of fixed-size splits, chunks are formed at natural linguistic or thematic boundaries such as paragraphs or concept boundaries to improve embedding quality | Improves retrieval relevance; avoids fragmented or mixed-topic chunks; boosts RAG output coherence | Requires sophisticated parsing; computationally more expensive than naive chunking | Complex document understanding, technical and legal document QA | |
Context Enrichment | Contextual Chunk Headers | Adding chunk headers with high-level context (doc titles, section names) to chunk text for better retrieval | Concatenate chunk header and text before embedding to help retrieval and reranking models disambiguate meaning | Significantly improves retrieval accuracy by resolving pronouns and implicit references | Needs reliable header extraction and management | Document-heavy knowledge bases, technical manuals, multi-section documents | |
Context Enrichment | Document Augmentation | Techniques to augment source documents or their representations to enhance model understanding | Includes methods like obscuring, paraphrasing, or re-rendering document sections to increase robustness to noise and variance | Improves model robustness to document variations; helps multimodal systems | Computationally expensive; risks semantic drift if over-augmented | Document image understanding, noisy document processing, training data diversification | |
Context Enrichment | Relevant Segment Extraction (RSE) | Merging adjacent or related chunks at query time to form more coherent retrieved segments | Dynamic post-processing merges top-ranked chunks if they form a continuous or thematically linked segment | Provides more complete, contextual answers; reduces fragmentation | Requires effective chunk adjacency or relationship detection | Legal and regulatory texts, long documents with cross references | |
Context Enrichment | Contextual Compression | Compressing retrieved context to remove irrelevant parts while preserving meaning | Techniques like summarization, pruning irrelevant sentences from retrieved chunks before generation | Controls input size, reduces noise, improves generation quality | Risk of losing important context if not careful | Systems with strict token limits; summarization-augmented retrieval | |
Context Enrichment | Context Window Enhancement | Enhancing retrieval by embedding individual sentences and including neighboring context | Embeds sentences individually, retrieves most relevant sentence plus surrounding context | Better context preservation, improves answer coherence | Increased token usage, more complex retrieval logic | Long document analysis, contextual Q&A, detailed explanations | |
Advanced Retrieval | Hierarchical Indices | Multi-level indexing of documents, from coarse to fine granularity, for efficient retrieval | First retrieve from high-level indices, then refine search in sub-indices or finer chunks | Scalable; efficient retrieval in large corpora | Indexing overhead; complex querying logic | Large-scale document corpora, layered knowledge bases | |
Advanced Retrieval | Fusion | Combining outputs from multiple retrievers or models to improve retrieval and generation | Could involve merging retrieval results, ensemble of models, or aggregated embeddings | More robust retrieval; improved accuracy | Increased complexity and computation | Multi-source data retrieval, hybrid search systems | |
Advanced Retrieval | Multi Model | Using multiple models with distinct capabilities within a RAG pipeline | E.g., combining separate retriever models, language models, or rerankers specialized on different data or tasks | Leverages model strengths; improves versatility | System complexity; higher resource needs | Cross-domain QA, multi-modal retrieval | |
Advanced Retrieval | Corrective RAG (CRAG) | A RAG variant that corrects its own retrieval or generation errors via an iterative or corrective process | Involves mechanisms to detect mistakes and refine retrieval or generation outputs automatically | Increases accuracy; reduces hallucination | Iterative steps increase latency; complexity to implement | High-accuracy domains like legal, medicine, or scientific research | |
Advanced Retrieval | Multi-faceted Filtering | Applying multiple filtering techniques to refine and improve retrieval results | Filters based on metadata, similarity thresholds, content criteria, and diversity | Improves result quality, reduces noise, ensures diversity | May filter out relevant results, requires tuning | Enterprise search, content curation, recommendation systems | |
Advanced Retrieval | Ensemble Retrieval | Combining multiple retrieval models or techniques for more robust results | Uses different embedding models or algorithms with voting/weighting mechanisms | More robust and accurate results, reduces individual model biases | Increased computational cost, more complex implementation | High-accuracy systems, production environments, critical applications | |
Advanced Retrieval | Dartboard Retrieval | Optimizing retrieval for both relevance and diversity using combined scoring | Combines relevance and diversity into single scoring function, selects documents that maximize information gain | Better coverage of information, reduces redundancy, improves overall retrieval quality | More complex algorithm, requires distance calculations between documents | Dense knowledge bases, comprehensive search, exploratory queries | |
Advanced Retrieval | Multi-modal RAG with Captioning | RAG system that handles both text and images by generating captions for images | Extracts images from documents, generates captions, combines with text for unified retrieval | Handles multi-modal content, improves search over visual content | Requires image processing, additional computational cost | Document-heavy applications, research papers with figures, multi-modal search | |
Iterative Techniques | Feedback Loop | Using system or user feedback to iteratively improve retrieval and generation | Incorporating relevance judgments, user clicks, or corrections back into retriever or retrain pipeline | Improves system accuracy over time; adapts to user needs | Requires feedback collection infrastructure and ongoing maintenance | Interactive assistants, adaptive QA systems | |
Iterative Techniques | Adaptive RAG | Dynamically adjusting RAG retrieval or generation parameters based on query or context | Could involve selecting different retrieval strategies or varying chunk sizes adaptively | Tailors retrieval to varying user queries; boosts efficiency and relevance | More complex system design | Systems with varied query complexity, multi-domain applications | |
Iterative Techniques | Adaptive Retrieval | Dynamically adjusting retrieval strategies based on query types and user contexts | Classifies queries and uses tailored retrieval strategies for each type | Better handles different query types, improves relevance | Requires query classification, more complex system | Dynamic Q&A systems, personalized assistants, varied query handling | |
Iterative Techniques | Iterative Retrieval | Performing multiple rounds of retrieval to refine and enhance result quality | Uses LLM to analyze initial results and generate follow-up queries to fill gaps | Improves answer completeness, handles complex queries better | Increased latency, more API calls | Complex research questions, multi-step reasoning, comprehensive analysis | |
Evaluation | DeepEval | Comprehensive evaluation framework for RAG systems using multiple metrics | Evaluates correctness, faithfulness, and contextual relevancy of RAG responses | Provides detailed performance metrics, helps identify system weaknesses | Requires ground truth data, computational overhead | RAG system development, performance benchmarking, quality assurance | |
Evaluation | GroUSE | Contextually-grounded LLM evaluation framework with multiple metrics | Evaluates LLM generations using 6 specific metrics for contextual grounding | Provides detailed evaluation of context usage, helps improve RAG systems | Requires specific evaluation setup, metric interpretation needed | RAG evaluation, context-aware assessment, LLM performance analysis | |
Explainability | Explainable Retrieval | Providing transparency in the retrieval process to enhance user trust | Explains why certain documents were retrieved and how they relate to the query | Increases user trust, helps with system refinement, provides debugging insights | Adds computational overhead, requires explanation generation | Transparent AI systems, user-facing applications, system debugging | |
Advanced Architecture | Self RAG | A technique where the generative model itself guides or performs retrieval to augment its own generation | Iteratively generates or refines queries to retrieve knowledge chunks, then conditions on them | Tight coupling of retrieval and generation; can improve contextual coherence | Computationally intensive; requires sophisticated model control | Complex reasoning tasks, iterative QA | |
Advanced Architecture | Knowledge Graph | Incorporates structured knowledge graphs into RAG to enhance retrieval and reasoning | Retrieval and generation leverage entities and relationships from knowledge graphs along with text embeddings | Allows relational reasoning; improves precision | Complex graph construction and maintenance | Domain-specific expert systems, scientific knowledge bases | |
Advanced Architecture | Microsoft GraphRAG | Microsoft's advanced RAG system using knowledge graphs for improved LLM performance | Extracts entities and relationships, creates community summaries, enables global and local search | Excellent for complex multi-hop questions, provides global understanding | High computational cost, complex implementation | Enterprise knowledge management, complex document analysis, research assistance | |
Advanced Architecture | RAPTOR | Recursive Abstractive Processing for Tree-Organized Retrieval using hierarchical structure | Creates tree structure with summaries at different levels, organizes information hierarchically | Scalable for large documents, provides multi-level abstraction | Complex tree construction, requires summarization at each level | Large document processing, hierarchical knowledge organization, scalable RAG | |
Special Technique | Sophisticated Controllable Agent | Advanced RAG agent for complex questions using deterministic graph as control system | Uses sophisticated deterministic graph for highly controllable autonomous operation | Excellent for complex reasoning, highly controllable, rigorous answer verification | Very complex implementation, requires significant setup | Complex research questions, mission-critical applications, expert-level analysis |
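A minimal sketch of the Reranker pattern from the table: a cheap first-stage score shortlists candidates, then a second-stage scorer re-orders them. Both scoring functions here (`cheap_score`, `cross_encoder_score`) are hypothetical lexical stand-ins for a BM25/bi-encoder retriever and a neural cross-encoder.

```python
def cheap_score(query: str, doc: str) -> int:
    # First stage: fast word-overlap score (stand-in for BM25 or a bi-encoder).
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d)

def cross_encoder_score(query: str, doc: str) -> int:
    # Second stage: a real reranker would run a neural cross-encoder over the
    # (query, document) pair; here we reward in-order adjacent query bigrams.
    d = doc.lower().split()
    return sum(1 for i in range(len(d) - 1) if f"{d[i]} {d[i + 1]}" in query.lower())

def retrieve_and_rerank(query, docs, first_k=3, final_k=1):
    # Shortlist cheaply, then spend the expensive scorer only on the shortlist.
    candidates = sorted(docs, key=lambda d: cheap_score(query, d), reverse=True)[:first_k]
    return sorted(candidates, key=lambda d: cross_encoder_score(query, d), reverse=True)[:final_k]

DOCS = [
    "vector search ranks documents by embedding similarity",
    "similarity search in a vector store retrieves nearest neighbours",
    "cooking pasta requires boiling water",
]
print(retrieve_and_rerank("vector search similarity", DOCS))
```

The latency/precision trade-off noted in the table shows up directly: the second stage is only run on `first_k` candidates, not the whole corpus.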
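HyDE can be sketched as follows. `hypothetical_answer` is a stub standing in for the LLM call that drafts a plausible answer, and the bag-of-words `embed` stands in for a dense encoder; the key move is embedding the hypothetical document rather than the raw query.

```python
import re
from collections import Counter
from math import sqrt

DOCS = [
    "Photosynthesis converts sunlight, water, and CO2 into glucose and oxygen.",
    "The stock market closed higher on Friday.",
]

def embed(text: str) -> Counter:
    # Bag-of-words stand-in for a dense embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hypothetical_answer(query: str) -> str:
    # Stub for the LLM call that writes a plausible (possibly wrong) answer.
    return "Plants use photosynthesis to turn sunlight, water, and CO2 into glucose."

def hyde_retrieve(query: str) -> str:
    # Embed the hypothetical answer, not the query, then match documents to it.
    hypo = embed(hypothetical_answer(query))
    return max(DOCS, key=lambda d: cosine(hypo, embed(d)))

print(hyde_retrieve("how do plants make food?"))
```

Even if the drafted answer contains errors, it tends to share vocabulary and structure with genuinely relevant documents, which is what improves recall.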
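The Contextual Chunk Headers row boils down to string concatenation before embedding; a minimal sketch, where the header fields and formatting are illustrative rather than a fixed convention:

```python
def with_header(doc_title: str, section: str, chunk_text: str) -> str:
    # Prepend high-level context so the embedding (and any reranker) can
    # resolve pronouns and implicit references inside the chunk.
    return f"Document: {doc_title}\nSection: {section}\n\n{chunk_text}"

# Without the header, "It" is ambiguous and the chunk embeds poorly.
chunk = "It must be renewed every two years."
print(with_header("Driver Licensing Handbook", "License Renewal", chunk))
```

The augmented string is what gets embedded and indexed; the bare chunk text can still be what is shown to the generator or the user.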
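One naive way to realize Contextual Compression is to drop retrieved sentences that share no content words with the query; a real system would more likely ask an LLM to prune or summarize, so treat this overlap filter as an assumption-laden sketch.

```python
import re

def compress(query: str, chunk: str) -> str:
    # Naive contextual compression: keep only sentences that share a word
    # with the query (stand-in for an LLM-based relevance pruner).
    q = set(re.findall(r"[a-z0-9]+", query.lower()))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", chunk) if s.strip()]
    kept = [s for s in sentences if q & set(re.findall(r"[a-z0-9]+", s.lower()))]
    return " ".join(kept)

chunk = ("The Treaty of Rome was signed in 1957. "
         "Lunch that day was reportedly excellent. "
         "It established the European Economic Community.")
print(compress("treaty rome community", chunk))
```

The risk listed in the table is visible here too: an aggressive filter can drop sentences whose relevance is only implicit, so thresholds need care.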
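The Context Window Enhancement row (sentence-window retrieval) can be sketched as: score each sentence individually, then return the best match together with its neighbours. The overlap-based `score` here is a stand-in for embedding similarity.

```python
import re

DOC = ("RAG pipelines retrieve text before generating. "
       "Sentence-window retrieval embeds each sentence on its own. "
       "At query time the best sentence plus its neighbours are returned. "
       "This keeps precision high while preserving local context.")

def split_sentences(text: str) -> list[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def score(query: str, sentence: str) -> int:
    # Hypothetical relevance score: word overlap instead of dense similarity.
    q, s = set(query.lower().split()), set(sentence.lower().split())
    return len(q & s)

def window_retrieve(query: str, text: str, window: int = 1) -> str:
    # Match at sentence granularity, return the surrounding window.
    sentences = split_sentences(text)
    best = max(range(len(sentences)), key=lambda i: score(query, sentences[i]))
    lo, hi = max(0, best - window), min(len(sentences), best + window + 1)
    return " ".join(sentences[lo:hi])

print(window_retrieve("embeds each sentence", DOC))
```

Matching on single sentences keeps retrieval precise, while the `window` parameter controls the extra context (and token cost) handed to the generator.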
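For the Fusion row, one widely used way to merge ranked lists from multiple retrievers is reciprocal rank fusion (RRF); a minimal sketch, assuming each retriever returns an ordered list of document IDs:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Each document earns 1/(k + rank) from every list it appears in;
    # k = 60 is the commonly used smoothing constant from the RRF literature.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["d1", "d2", "d3"]   # e.g. from a lexical retriever
dense_ranking = ["d3", "d1", "d4"]  # e.g. from an embedding retriever
print(reciprocal_rank_fusion([bm25_ranking, dense_ranking]))
# → ['d1', 'd3', 'd2', 'd4']
```

Because RRF only consumes ranks, not raw scores, it needs no score normalization across heterogeneous retrievers, which is why it is a popular default for hybrid search.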